feat(simd): BF16x16 + F16x16 SIMD vectors + slice ops (sprint W3-A)#126

Merged
AdaWorldAPI merged 1 commit into master from claude/burn-W3A-half-simd
Apr 30, 2026
@AdaWorldAPI
Owner

Closes parity items (2)+(3): adds half-precision SIMD vector types so that burn's NdArrayElement::F16/BF16 enum variants can dispatch through ndarray's SIMD layer.

What ships:

  • src/simd_half.rs (691 LOC): BF16x16 and F16x16 types with scalar dispatch (upcast to f32 → op → downcast)
  • Slice ops: add_bf16_inplace, mul_bf16_inplace, add_f16_inplace, mul_f16_inplace, cast_*_to_*_batch (8 helpers)
  • Re-exports from src/simd.rs
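The "upcast to f32 → op → downcast" pattern named in the list above can be sketched as follows. This is a minimal illustration, not the code in src/simd_half.rs: the type `Bf16x16Sketch`, the helper functions, the `_sketch` suffixes, the raw-`u16` lane representation, and the round-to-nearest-even downcast are all assumptions made for the example.

```rust
/// Hypothetical 16-lane BF16 vector; each lane holds raw bfloat16 bits.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Bf16x16Sketch([u16; 16]);

/// Upcast: a bfloat16 value is the top 16 bits of an IEEE-754 f32.
pub fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Downcast with round-to-nearest-even (assumed rounding mode).
pub fn f32_to_bf16(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Keep the result a NaN by forcing a mantissa bit on.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Add 0x7FFF plus the lowest kept bit so that ties go to even.
    let round = 0x7FFF + ((bits >> 16) & 1);
    (bits.wrapping_add(round) >> 16) as u16
}

impl Bf16x16Sketch {
    /// Loads the first 16 lanes; panics if `s` is shorter than 16.
    pub fn from_slice(s: &[u16]) -> Self {
        let mut lanes = [0u16; 16];
        lanes.copy_from_slice(&s[..16]);
        Self(lanes)
    }

    /// Lane-wise add: upcast both operands, add in f32, downcast.
    pub fn add(self, rhs: Self) -> Self {
        let mut out = [0u16; 16];
        for i in 0..16 {
            out[i] = f32_to_bf16(bf16_to_f32(self.0[i]) + bf16_to_f32(rhs.0[i]));
        }
        Self(out)
    }

    /// Stores the 16 lanes; panics if `dst` is shorter than 16.
    pub fn copy_to_slice(self, dst: &mut [u16]) {
        dst[..16].copy_from_slice(&self.0);
    }
}

/// Hypothetical slice helper: dst[i] += src[i] for slices of any length.
pub fn add_bf16_inplace_sketch(dst: &mut [u16], src: &[u16]) {
    assert_eq!(dst.len(), src.len());
    for (d, s) in dst.iter_mut().zip(src) {
        *d = f32_to_bf16(bf16_to_f32(*d) + bf16_to_f32(*s));
    }
}

/// Hypothetical batch cast from f32 to raw bfloat16 bits.
pub fn cast_f32_to_bf16_batch_sketch(src: &[f32], dst: &mut [u16]) {
    assert_eq!(src.len(), dst.len());
    for (d, s) in dst.iter_mut().zip(src) {
        *d = f32_to_bf16(*s);
    }
}
```

Doing the arithmetic in f32 keeps the scalar path portable, since it needs no target-specific intrinsics; the cost is one conversion in each direction per lane.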

Tests: 21 new, all passing. Total lib: 1817+ pass.

SIMD-accelerated paths (AVX2 emulation, AVX-512-BF16 native, NEON +fp16) are a follow-up. The scalar implementation is correct and portable, and it unblocks burn's NdArrayElement bound for half types.
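One way the follow-up could choose between an accelerated path and the scalar fallback is runtime feature detection. The sketch below is only an illustration of that idea, not a design from this PR: `HalfBackend` and `pick_backend` are hypothetical names, and the AVX-512-BF16 path is left out for brevity.

```rust
/// Hypothetical set of backends for half-precision ops.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HalfBackend {
    Scalar,
    Avx2,
    NeonFp16,
}

/// Picks a backend at runtime, falling back to the portable scalar path.
pub fn pick_backend() -> HalfBackend {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") {
            return HalfBackend::Avx2;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("fp16") {
            return HalfBackend::NeonFp16;
        }
    }
    // No accelerated path detected: use the scalar implementation.
    HalfBackend::Scalar
}
```

Because the scalar path is always available, a dispatcher of this shape can be added later without changing what callers see.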

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

…int W3-A)

Closes parity items 2 + 3. Scalar dispatch (upcast f32 -> op -> downcast).
SIMD-accelerated paths (AVX2 emulation, AVX-512-BF16 native, NEON +fp16)
are a follow-up. The scalar implementation is correct and portable, and
unblocks burn's NdArrayElement bound for half types.

- src/simd_half.rs: 691 LOC new module
- src/lib.rs: pub mod simd_half declaration
- src/simd.rs: re-exports

21 new tests, all passing. Total lib tests: 1817+ pass.

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
@AdaWorldAPI merged commit 49cd860 into master on Apr 30, 2026
5 of 14 checks passed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3358057a9c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/simd.rs
Comment on lines +1233 to +1234
#[cfg(all(feature = "std", not(all(target_arch = "x86_64", target_feature = "avx512bf16"))))]
pub use crate::simd_half::BF16x16 as BF16x16;
P1: Keep BF16x16 API stable across target features

crate::simd::BF16x16 now resolves to two incompatible types depending on compile flags: this line hides the new portable simd_half::BF16x16 when target_feature="avx512bf16" is set, so AVX-512-BF16 builds get simd_avx512::BF16x16 (unsafe load/convert-only API) instead of the new arithmetic API (from_slice, add, mul, copy_to_slice). Any consumer code written against the newly introduced BF16x16 methods will compile on scalar/NEON/AVX2 targets and fail on AVX-512-BF16 targets, which breaks the cross-target SIMD dispatch parity this change is meant to provide.
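One possible way to address this, sketched under assumptions, is to re-export the portable type unconditionally and expose the native type under a separate name. The module names and `BF16x16Native` below are hypothetical stand-ins, not the crate's actual layout; only `from_slice` and `copy_to_slice` are taken from the method names mentioned in the review, and their signatures are guessed.

```rust
mod sketch {
    /// Stand-in for the portable module (src/simd_half.rs in the PR).
    pub mod simd_half {
        #[derive(Clone, Copy, Debug, PartialEq)]
        pub struct BF16x16(pub [u16; 16]);

        impl BF16x16 {
            /// Loads the first 16 lanes; panics if `s` is shorter than 16.
            pub fn from_slice(s: &[u16]) -> Self {
                let mut lanes = [0u16; 16];
                lanes.copy_from_slice(&s[..16]);
                Self(lanes)
            }

            /// Stores the 16 lanes; panics if `dst` is shorter than 16.
            pub fn copy_to_slice(self, dst: &mut [u16]) {
                dst[..16].copy_from_slice(&self.0);
            }
        }
    }

    /// Stand-in for the native module; only built on AVX-512-BF16 targets.
    #[cfg(all(target_arch = "x86_64", target_feature = "avx512bf16"))]
    pub mod simd_avx512 {
        /// Hypothetical placeholder for the load/convert-only native type.
        pub struct BF16x16Native(pub [u16; 16]);
    }

    /// Stand-in for src/simd.rs.
    pub mod simd {
        // Unconditional: `simd::BF16x16` is the same type on every target.
        pub use super::simd_half::BF16x16;

        // The native type gets its own name instead of shadowing the portable one.
        #[cfg(all(target_arch = "x86_64", target_feature = "avx512bf16"))]
        pub use super::simd_avx512::BF16x16Native;
    }
}
```

With this shape, code written against `simd::BF16x16` compiles identically with or without `avx512bf16` enabled, and callers that want the native type opt in by name.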

Useful? React with 👍 / 👎.

2 participants